Abstract. The use of statistical methods to analyze large databases of text has been
useful to unveil patterns of human behavior and establish historical links between
cultures and languages. In this study, we identify literary movements by treating 
books published from 1590 to 1922 as complex networks, whose metrics were analyzed 
with multivariate techniques to generate six clusters of books. The latter correspond to 
time periods coinciding with relevant literary movements over the last 5 centuries. The 
most important factor contributing to the distinction between different literary styles 
was the average shortest path length (particularly, the asymmetry of the distribution). 
Furthermore, over time there has been a trend toward larger average shortest path 
lengths, which is correlated with increased syntactic complexity, and a more uniform 
use of the words reflected in a smaller power-law coefficient for the distribution of word
frequency. Changes in literary style were also found to be driven by opposition to earlier 
writing styles, as revealed by the analysis performed with geometrical concepts. The 
approaches adopted here are generic and may be extended to analyze a number of 
features of languages and cultures.

1. Introduction

Many findings related to language and culture have been made with the use of
statistical methods to treat large amounts of text [1, 2, 3, 4]. Recent examples are the
analysis of millions of books [1] and the study of Twitter messages, where the global
variation of mood could be observed through textual analysis of tweets [2]. In several
of these examples, knowledge is inferred from the analysis of semantic content in the
texts. There are also other methods to analyze text, including cases where text is
represented as a graph (or network) [5]. Of particular relevance was the finding that
networks formed from texts are scale free [6], with a topology whose analysis has led
to various contributions. For instance, the scale-free structure (which is analogous to
the Zipf's Law frequency distribution [7]) of text networks emerged as a consequence of
an optimization process for both hearer and speaker, so that the effort to transmit and
obtain a message was minimized [8]. In addition to allowing for cultural features to be
identified and explored, automatic analysis may be useful for real-world applications,
such as automatic text summarization [9], machine translation [11], authorship
attribution [12], information retrieval [13] and search engines [14].

In this study we used topological metrics of complex networks representing text 
from 77 books dating from 1590 to 1922 in an attempt to verify changes in writing style. 
With multivariate statistical analysis of the metrics obtained, we were able to identify 
periods that correspond to major literary movements. Furthermore, we established 
which network characteristics were responsible for the changes in writing style. 

2. Modeling Texts as Complex Networks 

2.1. Pre-Processing

The modeling process starts by removing punctuation and words that convey little
semantic content (see the Supplementary Information (SI)-Sec.1), such as articles and
prepositions. Then, the remaining words are transformed into their canonical form,
i.e. nouns and verbs are converted into the singular and infinitive forms, respectively.
This step is performed using the MXPOST part-of-speech tagger [15], which assists the
resolution of ambiguities. The transformation to the canonical form (lemmatization) is
done to cluster words referring to the same concept into a single node of the network
despite the differences in inflection. Finally, adjacent words in the written text are
connected in the network according to the natural reading order (the left word is the
source node and the right word is the target node). The modeling is demonstrated in
Table 1 for the pre-processing steps, while Fig. 1 illustrates the network obtained from
a small extract of the book Great Expectations, by Charles Dickens.
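
The pre-processing and network construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword set and the lemma map below are tiny, hypothetical stand-ins chosen to handle the Great Expectations extract, whereas the study relies on a full stopword list (SI, Sec. 1) and on the MXPOST tagger; the function names are likewise illustrative.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list and lemma map (assumptions
# made for this example only; not the resources used in the study).
STOPWORDS = {"my", "and", "of", "or", "than", "being", "nothing", "s"}
LEMMAS = {"names": "name", "could": "can", "longer": "long"}


def preprocess(text):
    """Lower-case, strip punctuation, drop stopwords, map words to lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in kept]


def build_network(words):
    """Directed co-occurrence network: one edge from each word to its successor."""
    edges = defaultdict(int)
    for source, target in zip(words, words[1:]):
        edges[(source, target)] += 1  # weight = number of co-occurrences
    return dict(edges)


sentence = ("My father's family name being Pirrip, and my Christian name "
            "Philip, my infant tongue could make of both names nothing "
            "longer or more explicit than Pip.")
words = preprocess(sentence)
network = build_network(words)
```

Because the three occurrences of "name" collapse into a single node, the network has fewer nodes than the extract has words, which is the purpose of lemmatization in this model.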
Table 1. Illustration of the pre-processing (removal of stopwords and punctuation
marks) and lemmatization of the extract "My father's family name being Pirrip, and
my Christian name Philip, my infant tongue could make of both names nothing longer
or more explicit than Pip." obtained from the book Great Expectations, by Charles
Dickens.

Original                     Without stopwords        After lemmatization
My father's family name      father family name       father family name
Pirrip, and my               Pirrip                   Pirrip
Christian name Philip        Christian name Philip    Christian name Philip
my infant tongue             infant tongue            infant tongue
could make of both           could make both          can make both
names nothing longer         names longer             name long
or more explicit than Pip    more explicit Pip        more explicit Pip



Figure 1. Network obtained from the extract "My father's family name being Pirrip, 
and my Christian name Philip, my infant tongue could make of both names nothing 
longer or more explicit than Pip." of the book Great Expectations, by Charles Dickens. 

2.2. Complex Networks Measurements 

Several metrics extracted from the networks were used to quantify the style of the books.
From each local measurement (i.e., one that refers to a node) we derived quantities
describing its distribution over the network in order to quantify the style of whole books.
The measurements and their corresponding distribution descriptors were chosen because
they have been useful to quantify the style of texts in previous studies [12]. The simplest
measurement is the number of nodes in the network, which corresponds to the
size of the vocabulary used to write the piece of text analyzed. The distribution of word
frequency was characterized using the coefficient $\gamma$ of the frequency distribution $p_k$:

$$p_k \sim c\,k^{-\gamma}, \qquad (1)$$

where $c$ is a normalization constant (see Fig. 2(a) for an example of the frequency
distribution $p_k$ of a specific book). We did not verify explicitly whether the degree
obeys a power-law distribution because $k$ is proportional to the frequency of words.
Since the word frequency follows Zipf's Law [16, 17], the degree is guaranteed to
obey a power-law distribution‡. To compute $\gamma$, we employed a technique based on
the accumulated distribution of $p_k$ (see Fig. 2(b)) described in Ref. [18]. We also used
the frequency of words (or equivalently the degree $k$ of the nodes) to calculate the
assortativity $\Gamma$ [19, 20, 21] (or degree-degree correlation) of the network as:

$$\Gamma = \frac{\frac{1}{M}\sum_{j>i} k_i k_j a_{ij} - \left[\frac{1}{M}\sum_{j>i}\frac{1}{2}(k_i + k_j)\,a_{ij}\right]^2}{\frac{1}{M}\sum_{j>i}\frac{1}{2}(k_i^2 + k_j^2)\,a_{ij} - \left[\frac{1}{M}\sum_{j>i}\frac{1}{2}(k_i + k_j)\,a_{ij}\right]^2}, \qquad (2)$$

where $M = 21{,}900$§ is the number of edges of the network and $a_{ij} = 1$ if nodes $i$ and $j$
are connected and $a_{ij} = 0$ otherwise. If positive values are obtained for $\Gamma$, then highly
connected nodes are usually connected to other highly connected nodes, indicating that
there may exist regions where nodes are highly interconnected [19]. Conversely, if $\Gamma$
is negative then highly connected nodes are commonly connected to poorly connected
nodes.
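
As an illustration of the degree-degree correlation, the sketch below computes the assortativity of a small network in the standard Pearson form over edges. It assumes an undirected, unweighted edge list without repeated edges; the function name and the toy graphs are hypothetical and not taken from the study.

```python
def assortativity(edges):
    """Degree-degree correlation of an undirected, unweighted edge list."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    m = len(edges)
    # Edge averages of the degree product, the mean degree and the mean
    # squared degree at the two ends of each edge.
    prod = sum(degree[i] * degree[j] for i, j in edges) / m
    mean = sum(0.5 * (degree[i] + degree[j]) for i, j in edges) / m
    square = sum(0.5 * (degree[i] ** 2 + degree[j] ** 2) for i, j in edges) / m
    denominator = square - mean ** 2
    if denominator == 0:
        raise ValueError("assortativity is undefined when all degrees are equal")
    return (prod - mean ** 2) / denominator


star = [(0, 1), (0, 2), (0, 3)]   # one hub linked to three leaves
path = [(0, 1), (1, 2), (2, 3)]   # four nodes in a line
gamma_star = assortativity(star)
gamma_path = assortativity(path)
```

A star graph is the extreme disassortative case, since its only hub connects exclusively to nodes of degree one; this is the situation the text describes for negative values.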

In addition to measurements based on the number of nodes of the network and on
the degree, the distance between concepts was employed to characterize the structure
of the books. This measurement, widely known in the theory of networks as average
shortest path length $l$ [22], is calculated from the distance $d_{ij}$, which represents the
minimum cost (minimum number of edges) required to reach node $j$, starting from node
$i$. After computing all pairs of values $d_{ij}$, the average shortest path length $l_i$ of each
node $i$ is:

$$l_i = \frac{1}{N-1}\sum_{j \neq i} d_{ij}. \qquad (3)$$

Since $l$ is defined for each node individually, the network is characterized by a
distribution of $l$ (see the distribution of $l$ for a specific book in Fig. 2(c)). The
distribution was characterized quantitatively by computing the average $\langle l \rangle$ and standard
deviation $\Delta l$. Additionally, we computed the weighted average $(1/\sum k_i)\sum k_i l_i = \langle l_w \rangle$,
so that greater importance was given to the most frequent words in the text. The third
moment $\varsigma(l)$ was also computed.

‡ The power-law distribution was verified for all texts of the database.

§ To avoid effects from the size of the books, for obtaining the complex network we used only the first
M + 1 words of each book.

Figure 2. Example of distributions of measurements for the book Great Expectations,
by Charles Dickens. The measurements used were: (a) simple word frequency; (b)
accumulated word frequency; (c) average shortest path length; and (d) clustering
coefficient. The adjusted R-square found in (a) was equal to 0.9348, which confirms
that the frequency distribution is very similar to a power law distribution.
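
The per-node average shortest path length and the descriptors of its distribution can be sketched as follows. Several choices here are assumptions made for illustration: the network is given as an adjacency dictionary, distances are averaged over reachable nodes only (which matches the per-node definition when the network is connected), and the third moment is taken as the standardized skewness, since the exact expression used in the study is not reproduced in this text.

```python
from collections import deque
from math import sqrt


def avg_shortest_paths(adj):
    """l_i: mean breadth-first distance from node i to every node it can reach."""
    result = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        others = [d for n, d in dist.items() if n != start]
        result[start] = sum(others) / len(others) if others else 0.0
    return result


def describe(values, weights):
    """Mean, standard deviation, weighted mean and skewness of a distribution."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    wmean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    # Assumption: third moment taken as the standardized skewness.
    skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3) if std else 0.0
    return mean, std, wmean, skew


adj = {0: [1], 1: [0, 2], 2: [1]}           # toy undirected path 0 - 1 - 2
l = avg_shortest_paths(adj)
nodes = sorted(adj)
degrees = [len(adj[n]) for n in nodes]      # degree used as the weight k_i
stats = describe([l[n] for n in nodes], degrees)
```

Weighting by degree pulls the average toward the values of frequent words, which is the stated motivation for the weighted descriptor.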



The last metric was the clustering coefficient $C$ [22], which quantifies the density
of connections between the neighbors of a node $i$ according to:

$$C_i = 3\,\frac{\sum_{k>j>i} a_{ij}a_{ik}a_{jk}}{\sum_{k>j>i}\left(a_{ij}a_{ik} + a_{ji}a_{jk} + a_{ki}a_{kj}\right)}. \qquad (5)$$

The clustering coefficient in equation (5) represents the fraction of triangles
among all possible connected sets of three nodes, and therefore $0 \le C_i \le 1$. Similarly to
the average shortest path length, it is also necessary to quantitatively characterize the
distribution of the measurement (see an example of distribution of $C$ in Fig. 2(d)). We
therefore computed the average $\langle C \rangle$, the standard deviation $\Delta C$, the weighted average
$(1/\sum k_i)\sum k_i C_i = \langle C_w \rangle$ and the third moment $\varsigma(C)$ to characterize the distribution.
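
A per-node clustering coefficient can be sketched with the standard local definition, i.e. the fraction of pairs of neighbors that are themselves connected. This is an assumption made for illustration (undirected network stored as a dictionary of neighbor sets, nodes with fewer than two neighbors assigned zero) and may differ in detail from the expression used in the study.

```python
from itertools import combinations


def clustering(adj):
    """C_i = links among the neighbours of i / possible links among them."""
    coeff = {}
    for node, neighbours in adj.items():
        neighbours = set(neighbours) - {node}
        k = len(neighbours)
        if k < 2:
            coeff[node] = 0.0  # convention: no pair of neighbours to connect
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        coeff[node] = 2.0 * links / (k * (k - 1))
    return coeff


# Toy network: a triangle (0, 1, 2) with one pendant node attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c = clustering(adj)
```

In the toy network the pendant node lowers the coefficient of node 0, mirroring the interpretation that words appearing in many unrelated contexts have low clustering.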



3. Database 



The database comprises 77 books available online at the Project Gutenberg
repository [23], whose publication dates ranged from 1590 to 1922. Tables S1-S3 in
(SI)-Sec.2 give the details of the books. The texts were represented with complex
networks, in which the edges are defined on
the basis of co-occurrence of words (see Sec. 2). The latter procedure has been proven
suitable to quantify both the style and structure of texts (see e.g. Refs. [11, 26, 29]). The
details of the procedures adopted to model texts as complex networks and a description
of the measurements employed to characterize the networks are given in Section 2.



4. Results and Discussion 



The evolution of literary styles was quantified considering the 11 measurements from
complex networks described in Sec. 2.2 for the books from the Project Gutenberg.
The main measurements were the shortest path length ($l$), the clustering coefficient
($C$), the assortativity ($\Gamma$), the power-law coefficient of the degree distribution ($\gamma$)
and the size of the vocabulary ($N$). An initial, arbitrary division of the books
into 6 intervals of 50 years, according to their publication date, led to the clusters
shown in the Canonical Variate Analysis (CVA, see details in (SI)-Sec.3) plot in Fig.
3. The distinction was relatively poor, especially considering the standard variation
ellipses [21] in the inset of the figure. Good separation was only possible when distant
periods in time were compared, as their ellipses did not overlap. This difficulty in
distinguishing literary movements should perhaps be expected as there is no reason
for sharp transitions to occur only because half-century marks were reached. We also
verified the distinguishability of clusters with the Principal Component Analysis (PCA,
see (SI)-Sec.3), but the distinction was also poor.

Figure 3. Scatter plot (CVA projection) representing the style of each book using
6 literary styles. Each style is represented by a set of 10 books. The inset displays the
dispersion of the literary styles.

In order to verify whether books from distinct publication dates could be
distinguished at all, we adopted a systematic procedure for the partition of the dataset
using an optimization approach. This was performed by assessing the quality of the
clustering under the condition that books with consecutive publication dates should
belong either to the same cluster or lie in the boundaries of consecutive clusters.
More specifically, we varied the delimiters and number of clusters in the database and
quantified the quality of the clustering using 2 indices, viz. the simplified silhouette
(SWC) and the Dunn index (DN) (see (SI)-Sec.4). Good distinction of writing styles
was obtained for 3, 4, 5, 6 and 7 clusters (see Figure S1 of the SI), according to the
two indices (SWC and DN). The best partition, which was found to be statistically
significant (see Figure 4), was obtained with SWC and CVA projection, leading to the
6 clusters in Fig. 5, where there is almost no overlap among clusters, as shown in the
inset. Most significantly, the 6 time periods inferred from this analysis coincide with
well-established literary movements listed in Table 2.
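
The partition search can be illustrated with a small exhaustive sketch. It assumes that the books are already sorted by publication date and projected to low-dimensional points, and it uses the centroid-based form of the simplified silhouette; the function names and the toy data are hypothetical, and the search strategy actually used in the study may differ.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def simplified_silhouette(clusters):
    """Average of (b - a) / max(a, b) using distances to cluster centroids."""
    cents = [centroid(c) for c in clusters]
    scores = []
    for idx, cluster in enumerate(clusters):
        for p in cluster:
            a = dist(p, cents[idx])                      # own centroid
            b = min(dist(p, c) for j, c in enumerate(cents) if j != idx)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_contiguous_partition(points, n_clusters):
    """Try every placement of n_clusters - 1 cuts along the date-ordered books."""
    best = None
    for cuts in combinations(range(1, len(points)), n_clusters - 1):
        bounds = (0, *cuts, len(points))
        clusters = [points[s:e] for s, e in zip(bounds, bounds[1:])]
        score = simplified_silhouette(clusters)
        if best is None or score > best[0]:
            best = (score, cuts)
    return best


# Toy date-ordered "books" forming three well-separated groups of two.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
score, cuts = best_contiguous_partition(points, 3)
```

Restricting clusters to contiguous runs of the date-ordered list enforces the condition that books with consecutive publication dates share a cluster or sit at the boundary between consecutive clusters.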

Table 2. Relationship between the best clustering of writing styles and the traditional
classification of literary movements.

Cluster Boundary    Literary Boundary    Literary Movement                Reference
1590 - 1653         1558 - 1603          Elizabethan era                  [33]
1664 - 1761         1660 - 1798          Neoclassicism / Enlightenment    [32, 35, 36]
1767 - 1793         1660 - 1798          Neoclassicism / Enlightenment    [32, 33]
1794 - 1818         1764 - 1820          Gothic fiction                   [32, 37]
1826 - 1906         1830 - 1900          Realism
1826 - 1906         1865 - 1900          Naturalism                       [32, 38]
1906 - 1922         1890 - 1940          Modernism                        [32, 39]




Other important features are inferred from Fig. 5. First, clusters for subsequent
time periods are normally placed next to each other, indicating smooth changes in
writing style over time. The same conclusion can be inferred from the analysis of the
hierarchical clustering in Fig. 6 with the Ward's [32] distance. The exception to this
trend was the major change in the 1794 - 1818 → 1826 - 1906 transition, which may
be the consequence of a drastic change in style triggered by the French Revolution
(1789). As for the variance among clusters, the lowest and highest values applied to
the 1590 - 1653 and 1906 - 1922 periods, respectively. These results are intuitive as
little change in style could be expected in older periods, while in the recent periods less
uniformity could be the result of the coexistence of many writing styles.

Figure 4. Significance test performed for (a) the simplified silhouette and for
(b) the Dunn index. The histograms represent the values of the cluster quality
indices considering a random distribution of points and the dotted lines represent the
clustering quality indices obtained for the clustering illustrated in Figure 5. Because the
silhouette for the random case $SWC_{rand} = 0.187 \pm 0.036$ is smaller than the silhouette
$SWC = 0.558$ for the clustering of Figure 5, the clustering inferred is significant. The
same applies for the Dunn index because $DN_{rand} = 0.059 < DN = 0.207$.
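
The idea behind the significance test of Figure 4 can be sketched for the Dunn index. The random baseline below is an assumption made for illustration: points are randomly reassigned to clusters of the original sizes, which is one possible reading of "a random distribution of points"; the function names, the number of trials and the toy data are hypothetical.

```python
import random
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    separation = min(dist(p, q)
                     for a, b in combinations(clusters, 2)
                     for p in a for q in b)
    diameter = max((dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return separation / diameter if diameter > 0 else float("inf")


def random_baseline(clusters, trials=200, seed=0):
    """Mean Dunn index over random relabellings that keep the cluster sizes."""
    rng = random.Random(seed)
    points = [p for c in clusters for p in c]
    sizes = [len(c) for c in clusters]
    values = []
    for _ in range(trials):
        rng.shuffle(points)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(points[start:start + size])
            start += size
        values.append(dunn_index(shuffled))
    return sum(values) / len(values)


clusters = [[(0.0, 0.0), (0.1, 0.0)],
            [(5.0, 5.0), (5.1, 5.0)],
            [(10.0, 0.0), (10.1, 0.0)]]
observed = dunn_index(clusters)
baseline = random_baseline(clusters)
```

A clustering is taken as significant when its index clearly exceeds the values obtained for the random case, which is the comparison reported in the caption of Figure 4.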

The most important factors contributing to the separation of literary styles were
determined in two distinct ways. The first technique considered a feature to be relevant
if it was capable of providing significant distinction between groups, regardless of the
other features. The list of metrics and the corresponding p-value for the difference of
a given measurement between pairs of clusters are given in Table 3. The asymmetry
in the distribution of the average shortest path length and the vocabulary size
exhibited the most significant variations. Interestingly, similar results were reported
in Ref. [12], where these two measurements were also useful to characterize personal
writing styles. In the second evaluation, a feature was considered relevant if it was able
to provide good distinction between groups based on the interdependencies of features.
This evaluation was carried out by computing the importance of each measurement
for the axes in the CVA plots. The results in Tables 4 and 5 point to the clustering
coefficient ($\langle C \rangle$ and $\langle C_w \rangle$) as the main factor for the distinction in 6 clusters. Since there is
evidence that the clustering coefficient quantifies whether words are restricted to specific
or generic contexts (an explanation of this property is given in Ref. [12])‖, it seems that
the extent of use of generic or specific words varied along history. This change has not
been monotonic, as indicated in Fig. 7(a). In fact, most of the network measurements
fluctuated over time, including the size of the vocabulary, whose considerable change was
responsible for the most drastic transition, 1794 - 1818 → 1826 - 1906.
This is clearly illustrated in Fig. 7(b). The only metric with a well-defined trend over
time was the coefficient of the power law for the scale-free networks representing the
texts. The decreasing trend in Fig. 7(c) points to a smoother, and therefore more
uniform, frequency distribution, which means that the difference in frequency between
low and high-frequency words decreased with time.

‖ Context-specific words are those appearing in only a few contexts. For example, the concept
"teacher" usually induces concepts related to the learning environment. On the other hand, generic
words may appear in a myriad of situations. Examples are "red" (red car, red wall or red skin) and
"identical" (identical behaviors, identical grades or identical plates).

Figure 5. Scatter plot representing the best clustering considering the writing style.
Note that besides being a good partitioning scheme, it also keeps a good representation
of the original database, since 82 % of the variance is kept in the CVA projection.

The changes in style between any two consecutive clusters appeared to have been
driven by opposition [30] (see Appendix A), which quantifies the extent to which the
current period can be thought of as an opposite movement to the previous literary
movements. The coefficient satisfies the inequality $W_{ij} > 0$, with the exception of the
1826 - 1906 → 1909 - 1922 transition. Furthermore, the opposition movement was
more significant than the skewness movement $s_{ij}$ (see Appendix A), which quantifies
how much the change in the current style deviates from the opposition movement. The
results are given in Table 6. In other words, the innovation of style ($\vec{M}_{ij}$, see definition in
Appendix A) was generally driven by contrasting the previous styles ($\vec{a}_i$, see definition
in Appendix A). As for the dialectics $p_{ijk}$ (see Appendix A), which quantifies how the
current movement $i$ is an implication of the two previous movements $j$ and $k$, no clear
pattern could be identified in Table 7. The lowest $p_{ijk}$ (and therefore the highest
dialectics) appeared during the 19th century. Thus, realism is the literary style that best
approximates a synthesis of the two previous literary periods.

Figure 6. Hierarchical relationship between literary periods using Ward's linkage
strategy. The 2 groups obtained after the division performed with a particular threshold
(dotted line) correspond to the oldest and to the newest books.

Figure 7. Dynamics of (a) average clustering coefficient; (b) vocabulary size; and
(c) coefficient of the power law. While the clustering coefficient and the vocabulary
size oscillate throughout the periods, the coefficient of the power law tends to decrease,
which shows that words were used in a more uniform way in the later periods.

Table 3. List of the most significant transitions. Taken individually, the most
prominent measurements for discriminating between clusters are the size of the
vocabulary $N$ and the third moment of the average shortest path length $\varsigma(l)$.

Measurement      Feature       Transition                      p-value
Vocabulary       N             1590 - 1653 → 1794 - 1818       0.048
                 N             1664 - 1761 → 1767 - 1793       0.051
                 N             1664 - 1761 → 1826 - 1906       0.001
                 N             1767 - 1793 → 1794 - 1818       0.011
                 N             1794 - 1818 → 1826 - 1906       < 1.0 × 10^-3
Assortativity    Γ             1590 - 1653 → 1767 - 1793       0.008
                 Γ             1590 - 1653 → 1826 - 1906       0.044
                 Γ             1664 - 1761 → 1767 - 1793       0.041
                 Γ             1664 - 1761 → 1826 - 1906       0.006
Shortest Path    ⟨l⟩           1664 - 1761 → 1826 - 1906       0.049
                               1664 - 1761 → 1906 - 1922       0.050
                 Δl            1590 - 1653 → 1906 - 1922       0.031
                 Δl            1664 - 1761 → 1906 - 1922       0.022
                 Δl            1767 - 1793 → 1906 - 1922       0.023
                 Δl            1826 - 1906 → 1906 - 1922       < 1.0 × 10^-3
                 ς(l)          1590 - 1653 → 1826 - 1906       0.028
                               1590 - 1653 → 1906 - 1922       < 1.0 × 10^-3
                               1664 - 1761 → 1906 - 1922       < 1.0 × 10^-3
                 ς(l)          1767 - 1793 → 1906 - 1922       0.001
                               1794 - 1818 → 1906 - 1922       0.019
                               1826 - 1906 → 1906 - 1922       < 1.0 × 10^-3
Clustering       ⟨C⟩           1664 - 1761 → 1767 - 1793       0.048
                 ⟨C⟩           1664 - 1761 → 1826 - 1906       0.051
                               1664 - 1761 → 1767 - 1793       0.054
                 ⟨C_w⟩         1664 - 1761 → 1826 - 1906       0.055
                 ΔC            1664 - 1761 → 1767 - 1793       0.054
                               1590 - 1653 → 1767 - 1793       0.045

In subsidiary studies we verified that the complex network metrics used are indeed
efficient in distinguishing styles. For that we examined the writing style dynamics of 10
books¶ of Charles R. Darwin (1809-1882) and Edith Wharton (1862-1937), whose styles
are known to differ considerably. Indeed, this is confirmed in the CVA plot in Fig. 8,
where again the most contributing factor for distinction was the clustering coefficient
$C$, since both $\langle C \rangle$ and $\langle C_w \rangle$ are responsible for 44 % of the weights in the first canonical
variable axis.

¶ The list of books is shown in Table S3 in (SI)-Sec.2.

Table 4. Importance of each measurement for the first canonical variable, where the
clustering coefficient $C$ and the average shortest path length $l$ were the most prominent.

Measurement (First Axis)     Prominence (First Axis)
⟨C_w⟩                        33.3 %
⟨C⟩                          31.6 %
ΔC                           6.6 %
⟨l⟩                          6.4 %
Γ                            5.1 %

Table 5. Importance of each measurement for the second canonical variable, where the
clustering coefficient $C$ and the average shortest path length $l$ were the most prominent.

Measurement (Second Axis)    Prominence (Second Axis)
⟨C⟩                          34.5 %
⟨C_w⟩                        33.7 %
⟨l_w⟩                        9.5 %
⟨l⟩                          9.4 %
ΔC                           3.4 %

Table 6. Opposition ($W_{ij}$) and skewness ($s_{ij}$) indices.

Period                        W_ij     s_ij
1590 - 1653 → 1664 - 1761     1.00     0.00
1664 - 1761 → 1767 - 1793     0.39     0.08
1767 - 1793 → 1794 - 1818     0.35     0.18
1794 - 1818 → 1826 - 1906     1.09     0.07
1826 - 1906 → 1909 - 1922     -0.01    0.08

Table 7. Counter-dialectics index $p_{ik}$.

Period                                         p_ik
1590 - 1653 → 1664 - 1761 → 1767 - 1793        0.76
1664 - 1761 → 1767 - 1793 → 1794 - 1818        1.49
1767 - 1793 → 1794 - 1818 → 1826 - 1906        0.39
1794 - 1818 → 1826 - 1906 → 1909 - 1922        0.69

Figure 8. Comparing Darwin's and Edith Wharton's styles with CVA projection. A
good separation can be observed, indicating that these two authors had quite different
styles.

5. Conclusion and further work

Changes in the writing style could be studied objectively by analyzing the metrics
from complex networks representing texts from books published over several centuries.
Significantly, the most appropriate clustering of books matched the traditional literary
classification, with the most contributing factor for distinguishability being the average
shortest path length. We found it to be possible to distinguish literary movements
using only the vocabulary size or the asymmetry of the average shortest path length
distribution. Innovation in writing style was found to be driven mainly by opposition,
with a growing trend of literary development toward counter-dialectics. Interestingly,
these findings represent the generalization of previous results where a dependence was
established between network topology and style of machine translations [10, 11] and
style of authors [12]. We believe that the approach used here may be useful to study
the evolution of any system of interest, since the basic concepts (i.e. characterization
through features and use of time series) are completely generic.

As future work, we plan to employ additional complex network measurements in a
larger database to verify if the discrimination can be further improved. We shall also
examine the relationship between semantics and topology, by generating clusters using
the semantics of words to be compared with the clusters obtained from the analysis
of network topology. A more challenging endeavor will be to extend the study to
other languages, in order to probe whether the patterns revealed in this paper can
be generalized.

Appendix A - Mathematical quantification of writing style

In this appendix we quantify mathematically the variation of writing style. To quantify
the change in style over time, we used three concepts, namely the opposition index, the
skewness index and the counter-dialectics index, which depend on the measurements
computed in each step of the temporal series. For each element $i$ of the temporal series,
which represents the value for the measurements described in Sec. 2.2, we defined the
11-dimensional vector $\vec{v}_i$:

$$\vec{v}_i = \left[N,\ \Gamma,\ \gamma,\ \langle C \rangle,\ \langle C_w \rangle,\ \Delta C,\ \varsigma(C),\ \langle l \rangle,\ \langle l_w \rangle,\ \Delta l,\ \varsigma(l)\right]. \qquad (6)$$

The large amount of data generated was visualized by projecting $\vec{v}_i$ into a
two-dimensional space before computing the indices, and this also helped to remove
undesirable correlations. The projection techniques employed are described in (SI)-
Sec.3. Using the projected $\vec{v}_i$, and considering $n$ elements in the time series, the
average state $\vec{a}_i$ at time $i$, $i \le n$, was defined as:

$$\vec{a}_i = \frac{1}{i}\sum_{m=1}^{i}\vec{v}_m. \qquad (7)$$

Given $\vec{a}_i$, the opposite state $\vec{r}_i$ of the current state $\vec{v}_i$ (see Fig. 9(a) for a geometrical
interpretation) is given by:

$$\vec{r}_i = \vec{v}_i + 2(\vec{a}_i - \vec{v}_i) = 2\vec{a}_i - \vec{v}_i, \qquad (8)$$

and given $\vec{r}_i$ and $\vec{v}_i$, the opposition vector $\vec{D}_i$ of state $\vec{v}_i$ (see Fig. 9(a)) is given by:

$$\vec{D}_i = \vec{r}_i - \vec{v}_i. \qquad (9)$$

For two consecutive books $i$ and $j$, the vector representing the style change $\vec{M}_{ij}$ (see
Fig. 9(a)) is:

$$\vec{M}_{ij} = \vec{v}_j - \vec{v}_i. \qquad (10)$$

The vector $\vec{M}_{ij}$ is important because its norm $\|\vec{M}_{ij}\|$ quantifies the change in style in
relation to the previous state $\vec{v}_i$. With $\vec{M}_{ij}$, the opposition index $W_{ij}$ is the component
of $\vec{M}_{ij}$ over $\vec{D}_i$:

$$W_{ij} = \frac{\langle \vec{M}_{ij}, \vec{D}_i \rangle}{\|\vec{D}_i\|^2}. \qquad (11)$$

If the current style tends to oppose the previous one, then the component of $\vec{M}_{ij}$
over $\vec{D}_i$ will have a high value. This quantifier is useful, for example, to identify little
stylistic innovation: if opposite movements are repeated over and over again, then there
is no innovation at all.

The skewness index $s_{ij}$, which is depicted in Fig. 9(a), is defined as the distance
between $\vec{v}_j$ and the line $L_i$ defined by $\vec{D}_i$. This index quantifies how far the stylistic
movement is from the opposite movement. It is useful to identify trivial oscillations
within the line $L_i$, for in this case a series of movements with zero skewness index would
be observed.

Figure 9. Illustration of the quantities employed to define the opposition, skewness
and counter-dialectics indices.

The dialectics between three consecutive styles $i$, $j = i + 1$ and $k = j + 1 = i + 2$ in
the temporal series was quantified as follows. If $\vec{v}_k$ is the outcome of a synthesis of the
styles represented by $\vec{v}_i$ and $\vec{v}_j$, then the distance $d_{ik}$ between $\vec{v}_k$ and the middle line
$ML_{ij}$ defined by $\vec{v}_i$ and $\vec{v}_j$ (see Fig. 9(a)) will be small. The counter-dialectics index
$p_{ik}$ is:

$$p_{ik} = \frac{d_{ik}}{\|\vec{M}_{ij}\|}. \qquad (12)$$

Further details regarding the definition of the opposition $W_{ij}$, skewness $s_{ij}$ and
counter-dialectics $p_{ik}$ are given in Ref. [30].

Note that we referred to $p_{ik}$ as the counter-dialectics index instead of the dialectics index because it is
defined as a distance. Hence, there is an inverse proportion between $p_{ik}$ and the concept of dialectics.
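
The three indices can be sketched numerically as follows. This is a hedged illustration built from the verbal definitions above: the average state is passed in explicitly rather than computed, the skewness is taken as the plain point-to-line distance, and the "middle line" is interpreted as the perpendicular bisector of the segment joining the two earlier states. The function names and the toy vectors are hypothetical.

```python
from math import sqrt


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return sqrt(dot(u, u))


def opposition_and_skewness(v_i, v_j, a_i):
    """W_ij and s_ij for the move v_i -> v_j, given the average state a_i."""
    d_i = [2 * (a - v) for a, v in zip(a_i, v_i)]   # opposition vector r_i - v_i
    m_ij = sub(v_j, v_i)                            # style change
    d2 = dot(d_i, d_i)
    if d2 == 0:
        raise ValueError("opposition is undefined when v_i equals a_i")
    w = dot(m_ij, d_i) / d2
    residual = sub(m_ij, [w * c for c in d_i])      # part of M_ij off the line
    return w, norm(residual)


def counter_dialectics(v_i, v_j, v_k):
    """p_ik: distance from v_k to the bisector of v_i and v_j, over |M_ij|."""
    m_ij = sub(v_j, v_i)
    length = norm(m_ij)
    if length == 0:
        raise ValueError("v_i and v_j must differ")
    middle = [(a + b) / 2 for a, b in zip(v_i, v_j)]
    d_ik = abs(dot(sub(v_k, middle), m_ij)) / length
    return d_ik / length


w, s = opposition_and_skewness([0.0, 0.0], [1.0, 1.0], [1.0, 0.0])
p_synthesis = counter_dialectics([0.0, 0.0], [2.0, 0.0], [1.0, 5.0])
p_far = counter_dialectics([0.0, 0.0], [2.0, 0.0], [3.0, 0.0])
```

A state lying on the bisector gives a zero index, i.e. the strongest dialectics, in line with the remark that the index is inversely related to the concept of dialectics.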